Thanks to the No Child Left Behind Act, annual testing in math and reading for students in grades three through eight became mandatory in every state beginning in 2005. Fifteen years later, a wealth of testing data has enabled researchers to build and analyze longitudinal models of student achievement, such as the one forming the basis of a recent CALDER working paper from Dan Goldhaber, Malcolm Wolff, and Timothy Daly.
They employ datasets from Massachusetts, North Carolina, and Washington, all of which have historical test score data, along with information on student characteristics and three specific long-term outcomes. While most other predictive studies stopped at eighth grade, Goldhaber and colleagues were able to track sixteen cohorts of students from third grade through high school, enabling them to investigate how accurately early measures of student achievement predict later outcomes, namely high school test scores, advanced course-taking, and graduation.
Researchers controlled for student characteristics such as race, gender, disability status, English language learner status, free or reduced-price lunch eligibility, and enrollment status in special education. Instead of simply using cut scores or hard test scores, they produced estimates by creating more flexible specifications, such as the decile of test score achievement in the third grade, and they examined the interactions of these deciles with different student characteristics. Importantly, students in the three states exhibited similar patterns of achievement on the initial third-grade tests.
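To make the idea of a decile specification concrete, here is a minimal sketch of how scores might be binned into deciles and then interacted with a single student characteristic. It is not the authors' code: the function names, the choice of free or reduced-price lunch eligibility as the interacting characteristic, and the scores themselves are all assumptions made for illustration.

```python
from bisect import bisect_right

def decile_cutpoints(scores):
    """Return nine cut points that split the scores into ten roughly equal groups.

    Assumes at least ten scores are supplied.
    """
    ordered = sorted(scores)
    n = len(ordered)
    return [ordered[(n * k) // 10] for k in range(1, 10)]

def assign_decile(score, cutpoints):
    """Map a score to a decile from 1 (lowest) to 10 (highest)."""
    return bisect_right(cutpoints, score) + 1

def interaction_features(decile, frl_eligible):
    """Build one indicator per (decile, eligibility) cell.

    Returns 20 columns with exactly one set to 1: columns 0-9 for
    non-eligible students, columns 10-19 for eligible students.
    """
    features = [0] * 20
    features[(decile - 1) + (10 if frl_eligible else 0)] = 1
    return features

# Hypothetical third-grade scores, invented for illustration only.
scores = list(range(1, 101))
cuts = decile_cutpoints(scores)
example_row = interaction_features(assign_decile(95, cuts), frl_eligible=True)
```

Giving each decile-by-characteristic cell its own indicator lets a model estimate a separate outcome level for, say, eligible students in the top decile, rather than forcing one straight-line relationship between test score and outcome.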
The topline finding was a strong correlation between a student’s place in the third-grade test distribution and that youngster’s performance on high school math tests. Moreover, there were consistent and strong relationships between third-grade math test scores and each of the high school outcomes of interest. For example, the poorest-performing students (those who scored in the lowest decile on the third-grade math test) scored 48–54 percentile points lower in the high school math test distribution, were 45–50 percent less likely to take an advanced course, and were 11–21 percent less likely to graduate. Soberingly, these relationships held even when researchers omitted eighth-grade test performance, suggesting that third-grade performance sets the tone for a student’s entire school career. Still, adding eighth-grade scores to the model increased its predictive power without changing the predictions themselves.
Race and poverty were strongly predictive of a student’s academic trajectory. Though this might not be news, the amount of academic inertia that this paper reports between third grade and high school is particularly worrisome. Even students who performed in the top percentiles on third-grade tests were less likely to maintain that performance level over time if they came from disadvantaged backgrounds. Notably, free-lunch-eligible students who performed in the top decile on the third-grade math test were only about as likely to graduate from high school as non-eligible students scoring in the second decile. In short, socioeconomically disadvantaged students who show academic prowess early on cannot count on staying on that trajectory the way their more affluent peers might. Other research corroborates these concerning findings.
Looking to determine whether these predictive results held across states, the research team found that using student achievement data and parameters from one state as the basis for predicting students’ educational outcomes in another state did not substantially reduce forecast accuracy. This increased their confidence in the predictive power of their modeling.
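The cross-state check can be pictured with a small sketch: estimate graduation rates by third-grade decile in one state, then see how well those rates predict outcomes in another. This is only an illustration under stated assumptions. The Brier score used here is a stand-in accuracy measure rather than the metric the paper reports, the function names are hypothetical, and the records are invented toy data, not results from any state.

```python
def fit_decile_rates(records):
    """Estimate the graduation rate within each third-grade decile.

    records: iterable of (decile, graduated) pairs, with graduated as 0 or 1.
    """
    totals, counts = {}, {}
    for decile, graduated in records:
        totals[decile] = totals.get(decile, 0) + graduated
        counts[decile] = counts.get(decile, 0) + 1
    return {d: totals[d] / counts[d] for d in counts}

def brier_score(rates, records):
    """Mean squared gap between predicted probability and observed outcome.

    Lower is better. Raises KeyError if a decile in records was never
    seen when the rates were fitted.
    """
    errors = [(rates[d] - g) ** 2 for d, g in records]
    return sum(errors) / len(errors)

# Toy records invented for illustration; not real student data.
state_a = [(1, 0), (1, 1), (10, 1), (10, 1)]
state_b = [(1, 0), (1, 0), (10, 1), (10, 1)]

rates_a = fit_decile_rates(state_a)
rates_b = fit_decile_rates(state_b)

within_state = brier_score(rates_b, state_b)  # fitted and scored on state B
cross_state = brier_score(rates_a, state_b)   # fitted on state A, scored on state B
```

If the cross-state score sits close to the within-state score, parameters learned in one state transfer reasonably well to another, which is the pattern the research team describes.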
Testing students annually and using the results to inform policy decisions, such as school accountability, has been a major federal strategy for two decades. Such test results are also often used as diagnostic tools that help educators and parents identify individual student needs, and they can serve as a sort of “warning system” to identify pupils who need academic support. Yet this study suggests that neither of these goals is best served by the testing frameworks currently in place. Most testing occurs at the middle and high school level. If it’s true that third-grade test scores are strongly predictive of vital high school outcomes, both middle school testing and any interventions that arise from it seemingly come far too late to help the students and schools that need it the most.
Still, this report reinforces the truth that test scores offer vital information about students’ academic achievement now and in the future (a fact that will play no small part in the looming decision to keep or cancel 2021 standardized tests). Whether these warning signs will be recognized in light of a dangerous antipathy toward testing, and whether intervention will begin early enough to overcome their predictive power, are open questions. While test naysayers push forward, the research indicates that vulnerable students’ academic progress will stall.
SOURCE: Dan Goldhaber, Malcolm Wolff, & Timothy Daly, “Assessing the Accuracy of Elementary School Test Scores as Predictors of Students’ High School Outcomes,” CALDER Working Paper No. 235-0520 (May 2020).